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1.  Introduction 


Among  the  challenges  faced  in  molecular  simulations  is  balancing  the  competing 
needs  of  computational  complexity  and  fidelity  to  the  underlying  chemical  and 
physical  phenomena  being  studied.  The  challenge  is  made  more  acute  when  the 
properties  under  investigation  evolve  over  time,  as  many  orders  of  magnitude  ex¬ 
ist  between  the  fundamental  time  scale  of  molecular  motions  and  the  time  scale 
of  collective  material  processes  such  as  heat  transfer,  diffusion,  or  elastic  defor¬ 
mations.  Coarse-graining  methodologies  have  been  frequently  used  to  reduce  the 
complexity  of  molecular  dynamics  (MD)  simulations,  by  reducing  the  number  of 
“particles”  being  studied  1-5.  Although  the  elimination  of  degrees  of  freedom  (DoFs) 
often  leads  to  minor  gains  in  the  time  steps  that  can  be  used  in  dynamic  simula¬ 
tions6,  such  improvements  have  not  led  to  significant  breakthroughs  in  simulation 
capabilities.  More  advanced  coarse-graining  techniques  are  required  to  enable  sim¬ 
ulations  of  systems  exhibiting  structure  on  the  micron  or  submillimeter  scales,  such 
as  semicrystalline  materials. 

In  this  report,  we  present  a  new  approach  toward  that  end,  focusing  on  polymeric 
materials  composed  of  homopolymers  or  block  copolymers.  Our  approach  is  based 
on  the  concept  of  diffusion  wavelets7,8,  which  both  automatically  identifies  chem¬ 
ical  structures  that  can  be  reduced  into  coarse-grained  (CG)  units  and  also  allows 
for  repeated  application,  thus  providing  much  greater  customization  of  the  level  of 
simulation.  Such  a  technique  enables  the  potential  reduction  of  the  number  of  DoFs 
by  orders  of  magnitude,  as  well  as  for  a  vastly  increased  time  step,  which,  when 
taken  together,  permits  vastly  extended  time  scales  to  be  simulated  using  currently 
available  computational  resources. 

Among  the  shortcomings  of  typical  coarse-graining  approaches  has  been  that  they 
normally  involve  only  2  levels  of  description,  the  “original”  atomistic  level  and  the 
coarse  representation,  which  has  typically  been  based  on  the  developer’s  judgment 
or  intuition.  These  techniques  rely  on  partitionings  of  the  atoms  as  the  foundation 
to  deriving  coarse  DoFs,  which  typically  replace  the  atomistic  groupings  with  a  sin¬ 
gle  “bead”-like  entities9-15.  While  approaches  such  as  the  force-matching  method 
of  Voth  et  al. 16  and  the  reversible  coarse-graining  method  for  phenolic  polymers 
by  Kremer  et  al. 9,10,17  have  greatly  impacted  accessible  simulation  scales  for  these 
materials,  they  do  not  fundamentally  change  the  underlying  computational  process. 
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Furthermore,  these  CG  approaches  introduce  the  difficult  problem  of  generating 
consistent  atomistic  reconstructions,  because  particle-like  CG  beads  lose  informa¬ 
tion  about  the  particles  they  subsume,  which  has  to  be  recaptured  by  other  means 
(e.g.,  the  use  of  dummy  variables  as  in  dissipative-particle  dynamics  with  energy 
conservation18). 

The  present  work  focuses  on  polymeric  materials,  which  are  the  target  application 
for  many  classes  of  coarse-graining  methodologies.  In  particular,  polymers’  long 
chain  lengths  and  low  defect  concentrations  compound  the  problems  of  2- scale  rep¬ 
resentations  and  fine-scale  reconstruction,  by  requiring  a  large  number  of  particles 
and  their  associated  DoFs.  Moreover,  the  equilibration  time  for  polymeric  liquids 
is  itself  computationally  challenging:  a  melt  whose  chains  contain  N  beads  each 
will  require  0(N3)  time  to  equilibrate,  making  simulation  essentially  impossible 
without  advanced  simulation  approaches19.  However,  coarse-graining  need  not  be 
restricted  only  to  polymers;  even  relatively  “simple”  fluids,  such  as  water,  have  been 
the  subject  of  CG  models. 

Many  of  the  previously  mentioned  problems  can  be  alleviated  through  the  selec¬ 
tion  of  an  appropriate  basis  set  for  describing  the  internal  structure  of  individual 
molecules.  Consider  the  analogy  of  a  time-varying  quantity  f(t).  One  naturally 
represents  f(t )  in  terms  of  an  infinitely  local  basis  (Dirac  delta  functions).  A  more 
sophisticated  approach,  suitable  for  analyzing  certain  average  properties  of  the  sig¬ 
nal,  might  employ  the  Fourier  basis,  sines  of  varying  frequency.  However,  the  latter 
basis  is  global  in  nature,  challenging  processing  and  storage  for  signals  that  are 
infrequent  or  have  sharply  varying  features  only  over  small  durations  in  time.  In 
signal  processing,  wavelets  are  often  used  as  a  basis  to  differentiate  between  local 
and  increasingly  global  features  of  the  signal,  because  wavelet  bases  can  be  flexibly 
defined  to  efficiently  capture  features  of  varying  localization20-21.  Few  other  mod¬ 
els  provide  the  on-the-fly  adaptivity  required  for  important  problems,  which  in  the 
structural  modeling  sense  might  be  problems  including  crack  initiation,  crack  prop¬ 
agation,  and  interfacial  phenomena.22  24 .  By  analogy  to  the  time-varying  signal, 
purely  atomistic  models  and  standard  2-scale  CG  models  represent  infinitely  local 
Dirac  distributions,  which  are  expensive;  coarse-graining  methods  based  on  glob¬ 
ally  periodic  functions  (e.g.,  plane  waves)  are  inefficient  for  modeling  localized 
properties.  In  the  context  of  time-varying  signals,  the  mathematical  inefficiency  can 
be  quantified  precisely:  except  for  limited  special  cases,  either  basis  requires  in- 
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finitely  many  coefficients  to  approximate  a  signal  to  within  tolerance  of  a  given 
error  metric  (e.g.,  the  C  or  metrics).  In  contrast,  the  number  of  coefficients  re¬ 
quired  for  a  wavelet-based  representation  is  usually  0(log2  N),  where  N  is  the  size 
of  the  signal  (e.g.,  number  of  time  samples). 

Wavelet  ideas  have  already  been  used  extensively  to  analyze  time-series  data21. 
Our  application  of  wavelet  ideas  to  structural  representation  extends  the  work  of 
Ismail25-26.  In  particular,  the  earlier  work  employed  Monte  Carlo  and  did  not  cap¬ 
ture  dynamical  information.  The  work  here  develops  an  approach  suitable  for  MD 
simulation  by  interpreting  the  wavelet  transform  of  Ismail  in  terms  of  the  equations 
of  motion.  The  method  provides  a  consistent  and  systematic  framework  to  derive 
multiple  levels  of  model  resolution  while  also  reducing  simulation  complexity. 

This  approach  has  numerous  advantages,  whose  theoretical  basis  we  address  in  this 
report;  follow-on  work  will  illustrate  more  concrete  applications.  The  general  theme 
of  the  advantages  is  that  for  the  dynamical  and  nonequilibrium  metrics  of  interest, 
this  approach  especially  captures  molecular  information  relevant  to  both  thermody¬ 
namics  and  kinetics.  In  particular,  application  of  diffusion  wavelets  to  the  chemical 
topology  of  the  molecule  leads  to  the  identification  of  what  we  call  collective  action 
modes  (CAMs)  that  represent  coordinated  motions  within  the  molecule  at  various 
length  and  time  scales.  Our  approach  is  rigorously  tied  to  the  underlying  physics 
and  has  the  potential  to  increase  simulation  size  and  duration  by  several  orders  of 
magnitude.  Moreover,  the  approach  is  agnostic  to  the  kind  of  material  being  studied 
and  can  be  applied  both  to  structures  of  arbitrary  chemical  complexity,  including 
both  relatively  simple  molecules  such  as  water  or  benzene,  as  well  as  more  compli¬ 
cated  chain  molecules  such  as  polymers  and  biopolymers.  Finally,  it  is  also  capable 
of  capturing  minor  effects  such  as  mass  effects  from  chemical  substitution  (e.g., 
partial  deuteration  or  fluorination).  This  may  be  of  special  importance  for  materials 
design,  where  the  task  is  to  link  macroscopic  behavior  (e.g.,  Young’s  modulus)  to 
the  atomic  structure  of  the  monomer  unit. 

We  proceed  by  introducing  the  type  of  classical  Hamiltonian  model  we  seek  to 
accelerate,  as  well  as  the  wavelet  methodology  and  its  basic  properties.  We  then 
discuss  how  wavelets  can  be  applied  to  uncover  the  CAMs  for  a  given  molecule 
and  how  to  use  CAMs  to  systematically  reconstruct  finer  resolutions,  as  well  as 
deriving  mixed  resolutions.  We  present  several  examples  showing  the  application 
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of  the  method  for  small  molecules  such  as  water  and  hydrogen  cyanide  (HCN) 
before  summarizing  our  conclusions. 

2.  Methodology 

We  begin  by  analyzing  the  equations  of  motion  in  MD  for  a  physics-based  wavelet 
construction.  The  foundations  of  MD  lie  in  the  application  of  Newtonian  mechanics 
to  the  energy  functional 

E  =  ^tr(xTMx)  +  V(x),  (1) 

where  x  E  MiVx3  are  particle  positions,  x  E  M7Vx3  are  particle  velocities,  N  is 
the  number  of  particles,  M  is  the  diagonal  matrix  of  particle  masses,  and  V  is 
the  potential.  For  the  macromolecular  systems  we  are  interested  in,  V  is  usually 
partitioned  as 


V 


^harmonic  T  knon— harmonic; 


(2) 


where  Charmonic(x)  =  Kij(\\xi  ~  xj II  ~  r§3))2/2,  Kij  is  the  force  constant  of 
the  harmonic  oscillator,  Xk  is  the  position  of  particle  k,  and  is  the  equilibrium 
distance  of  particles  i  and  j.  The  atoms  and  the  bonds  between  them  define  a  graph 
in  which  the  atoms  are  the  vertices  and  an  edge  between  atoms  i  and  j  has  weight 
Kjj.  The  maximum  number  of  edges  for  a  vertex  in  organic  materials  is  4,  and 
even  in  organometallic  complexes,  the  number  of  edges  is  unlikely  to  exceed  6  (the 
typical  maximum  coordination  number).  Consequently,  the  matrix  representation 
of  this  graph,  defined  by  K,  should  be  highly  sparse. 


Our  multiresolution  approach  begins  with  the  graph  Laplacian  of  the  weighted 
graph  defined  using  particles  as  vertices  and  the  bonds  as  edges  weighted  by  the 
harmonic  force  constant.  This  graph  Laplacian,  denoted  A,  is  here  a  matrix  that  can 
be  defined  as 


A  :=  diag(JTl)  —  K, 

(3) 

A.  >  ./ 

\  ~Kij,  >  j 

(4) 

where  1  denotes  a  vector  of  all  ones  and  diag(t')  denotes  a  diagonal  matrix  defined 
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in  terms  of  a  vector  v,  so  that  if  A  =  diag(t')  then  Alt  =  vt .  An  example  of  the 
weighted  graph  Laplacian  A  for  a  linear  triatomic  molecule  can  be  found  in  Figure 
1. 

Using  Eq.  1  and  our  definition  of  A,  we  derive  the  equations  of  motion 

Mx  =  —  VU(x)  =  —Ax  —  V'(x),  (5) 


where  V'(x)  =  W(.x)  —  Ax  is  the  force  for  x  not  due  to  A.  A  is  the  graph 
Laplacian  of  the  weighted  graph  with  particles  as  vertices  and  the  bonds  as  edges 
weighted  by  the  harmonic  force  constant  (Figure  1). 


K 12  =  1.3 

1  wmrrrrmrrr  2 


A23  =  2.1  — y  A  = 


1 

2 

3 

1 
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Fig.  1  Example  of  a  weighted  graph  Laplacian  derived  from  a  simple  weighted  graph 


By  letting  r  =  Ml/2x,  p  —  r,  and  q  =  A  1//2M1/2x  where  A  =  M-1/,2AM_1/2, 
the  harmonic  energy  term  can  be  expressed  as  (||p||2  +  \\q\\2)  / 2  and  the  equations 
of  motion  become 


p  =  —  A^q  —  V'(x)  =  —  A^q  —  V(r),  (6) 

where  V (r)  =  V'(M~ 1  2r)  is  the  effective  potential  in  terms  of  r.  If  U  =  0,  the 
system  can  be  solved  analytically, 


y{t)  =  eTft/0, 


(7) 


where 


T  = 


and  y0 


(8) 


It  can  be  shown  that  A  is  positive  semi-definite,  and  that  the  eigenvectors  of  the 
exponential  operator  in  Eq.  7  are  (Uj,±iUj),  where  Uj  is  an  eigenvector  of  A. 
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The  solutions  to  Eq.  7  oscillate  with  frequencies  ±c Oj,  respectively,  where  u?  is 
the  eigenvalue  of  A  corresponding  to  the  eigenvector  Uj.  For  this  analytical  case, 
any  eigensolution  of  T  can  be  propagated  through  time  independently  of  any  other 
solution.  Unfortunately,  this  simplicity  is  in  general  broken  by  V,  which  nonlin- 
early  couples  all  of  the  eigensolutions  to  one  another.  As  a  consequence,  all  solu¬ 
tions  have  to  be  simulated  concurrently  according  to  the  highest  frequency  associ¬ 
ated  with  an  eigenvector.  Hence,  the  transformation  from  particle  space  coordinates 
(x,xY  to  the  harmonic  solution  coordinates  offers  no  advantage.  To  circumvent 
this  issue,  a  basis  that  isolates  the  coupling  effects  from  high-  and  low-frequency 
components  is  needed. 

The  key  motivation  for  our  work  is  the  recognition  that  the  mass -weighted  graph 
Laplacian  A  relates  spatial  coordinates  to  temporal  frequencies,  which  suggests  that 
its  eigenvector  matrix  is  a  promising  basis  for  compression.  The  weighted  graph 
Laplacian  A  and  its  matrix  of  eigenvectors  are  then  analogous  to  the  Laplace  oper¬ 
ator  and  the  Fourier  transform,  respectively,  in  conventional  wavelet  theory. 

2.1  Basic  Multiresolution 

Here  we  introduce  the  wavelet  transform  used  in  this  work  and  its  derivation.  We 
use  the  multiresolution  analysis  for  diffusion  wavelets  as  introduced  by  Coifman 
and  Maggioni7.  In  essence,  the  multiresolution  decomposition  partitions  the  eigen¬ 
values  and  eigenvectors  of  A,  effectively  strongly  coupling  high  frequencies  in  the 
time  domain  to  high-frequency  eigenvectors  of  A  in  the  “particle”  domain.  This 
is  an  important  point  for  the  applicability  of  our  approach:  not  only  can  DoFs  be 
reduced,  but  the  time  step  may  also  be  increased  considerably,  approximately  by  a 
factor  of  2  at  each  subsequent  resolution. 

The  multiresolution  scheme  (Figure  2)  relies  on  a  positive-definite  low-pass  filter 
T  with  UTlIoo  =  1  (i.e.,  an  operator  that  suppresses  high-frequency  vectors)  and  an 
accuracy  operator  P£  that  projects  eigenvectors  of  a  matrix  X  6  span{T2  n  <E  N} 
with  associated  eigenvalue  less  than  a  given  accuracy  e  >  0  to  zero.  The  effect  and 
purpose  of  the  filter  is  to  project  out  DoFs  associated  with  high  frequencies,  thereby 
producing  a  hierarchy  of  CAMs  in  respective  vector  spaces.  After  each  application, 
the  associated  frequencies  roughly  halve,  and  concomittantly,  the  minimum  time 
step  size  roughly  doubles. 
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Fig.  2  Dyadic  trees  generated  by  multiresolution  formalism  using  a  filter  T.  Shaded  spaces  are 
final  subspaces  of  the  multiresolution.  All  other  spaces  are  intermediates  of  the  construction, 
a)  Generic  scheme;  and  b)  an  example  using  butane.  In  the  final  spaces  in  gray,  red  and  blue 
denote  opposite  signs  of  weights  in  the  construction  from  the  finer  scale. 
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The  recursively  defined  vector  spaces  Vn  =  Vn+i  ©  Wn+1,  where 

K+1  =  Pe{T2n+1)Vn  (9) 

and 

H4+1  =kerP£(T2'n'+1|v.J  (10) 

are  iteratively  associated  with  orthonormal  bases  via  QR-decompositions, 

T2"  =  QnRn,  (11) 

where  Qn  is  unitary,  Rn  is  upper  triangular,  and  both  are  dependent  on  the  basis 
used  to  express  T 2" . 

One  advantage  of  this  approach  is  the  inherent  invariance  of  the  wavelet  spaces  with 
respect  to  permutations  of  the  graph  vertices  and  its  ability  to  deal  with  arbitrary 
matrices  T  with  ||T||oo  <  1.  The  wavelet  space  at  scale  n  (Wn)  is  approximately 
spanned  by  eigenvectors  of  T  whose  eigenvalues  obey 

2-n+1  In  £  <  In  A  <  2~n  In  e.  (12) 

Only  a  limited  degree  of  mixing  with  eigenvectors  associated  with  eigenvalues  out¬ 
side  of  these  bounds  is  possible  (see  Section  9  for  details).  Since  T  is  positive  semi- 
definite,  T  can  be  rewritten  as  an  exponential  elnT.  The  application  of  the  previous 
procedures  are  equivalent  to  doubling  the  spectrum  of  In  T  followed  by  projecting 
out  the  high-frequency  components  of  the  spectrum  of  In  T. 

2.1 .1  Properties  of  A  as  a  Filter 

With  an  efficient  means  of  computing  the  wavelet  transform  in  hand,  we  consider 
exclusively  in  the  following  the  low-pass  filter  T  —  I  —  A/C,  where  C  is  a  constant 
sufficiently  large  to  render  T  positive  semi-definite.  To  minimize  the  number  of 
matrix-matrix  multiplications,  in  particular,  with  ker  Pe(T2")|yn  =  0,  the  normal¬ 
ization  constant  C  =  HAH^  would  generally  be  optimal  as  4  0-  Eigenvalues 
for  graph  Laplacians,  such  as  A,  are  known  to  lie  in  [0,  2  max*  A,;,].27  Hence,  C  is 
chosen  to  be  between  max,  A„  and  2  max,;  A,,  for  all  numerical  examples. 

As  implied  by  Eq.  12,  the  frequency  range  A  In  A  =  2~n  In  £  for  each  wavelet  space 
Wn  is  drawn  ever  tighter  with  each  iteration  while  the  eigenvalues  A  approach  1. 
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Thus  generally,  more  CAMs  correspond  to  scales  of  high  frequency  than  scales  of 
low  frequency.  As  a  result,  each  successive  scale  corresponds  to  an  increase  in  the 
minimum  time  step  size  of  that  scale  in  MD  simulations  of  approximately  £~2 "  as 
well  as  a  significant  reduction  of  CAMs.  Furthermore,  unless  the  effective  potential 
V  couples  involved  scales  strongly,  sufficiently  coarse  scales  (n  m )  are  quasi¬ 
static  compared  to  a  given  scale  m,  while  sufficiently  fine  scales  (n  <C  m )  only 
influence  the  target  scale  via  their  static  mean.  Therefore,  only  the  relevant  scales 
need  to  be  propagated  through  time,  reducing  the  DoFs  and  allowing  us  to  raise  the 
time  step  size  to  match  the  scales. 

Assuming  that  eigenvalues  of  T  are  distributed  approximately  exponentially,  e  = 
e-1/2  would  lead  to  log2  N  scales,  where  N  is  the  dimensionality  of  A.  This  leads 
to  many  DoFs  per  scale  due  to  issues  discussed  in  the  following  sections.  Instead, 
a  higher  resolution  of  e^achine  *s  used-  Although  this  wastes  some  computation  on 
the  first  few  iterations  because  ker  PJT2  \Vn  =  0,  it  is  equivalent  to  choosing  a 
tolerance  on  the  scale  of  drnax,  ^effective  =  ^LThine  ~  1  * 

2.1.2  How  Molecular  Information  Influences  A 

Since  the  filter  T  and  the  weighted  graph  Laplacian  A  derived  from  the  MD  po¬ 
tentials  share  the  same  eigenvectors  albeit  with  reversed  order  of  eigenvalues,  we 
discuss  its  properties  in  greater  detail.  These  properties  have  a  major  impact  on  the 
performance  of  the  wavelet  transform.  In  the  following,  we  discuss  shortly  the  con¬ 
ditions  under  which  separate  groups  of  atoms  are  strictly  independent  of  each  other 
leading  to  highly  localized  CAMs. 


Due  to  the  small  number  of  bonds  an  atom  generally  has,  vertices  are  also  gener¬ 
ally  of  low  degree.  As  a  result,  there  are  highly  localized  modes  due  to  invariant 
subspaces  of  chemical  motifs.  For  example,  any  methylene  (CH2)  group  has  an  as¬ 
sociated  medium-frequency,  highly  localized  eigenvector  of  A.  This  follows  from 
the  fact  that  the  hydrogens  are  leaves  on  the  graph  (i.e.,  the  weighted  subgraph 
Laplacian 


A  ch2  = 


+  o 


(  2KCh 
12 

Kch 

s/12 

Kqh 


VVl 


_ Kqh 

s/12 

KCh 

0 


Kch  \ 
\/~12 

0 


K 


CH 


(13) 


where  o  collects  the  contributions  from  further  bonds  with  the  carbon  atom,  shows 
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only  contributions  from  CH2  for  the  hydrogens).  If  the  hydrogens  are  identified  with 
indices  i  and  j  on  the  full  graph  Laplacian,  then  (e,  —  e:] )  /  a/2  is  an  eigenvector  with 
an  associated  frequency  a / KCh / m#  of  a  CH  vibration. 

A  slightly  more  involved  example  that  also  shows  that  this  phenomenon  is  not  re¬ 
stricted  to  leaves  on  the  graph  is  the  repeat  unit  of  the  energetic  polymer  poly- 
1,  2,4, 5-tetrazine  (Figure  3),  which  has  an  invariant  subspace  spanned  by  2  indepen¬ 
dent  eigenvectors.  The  repeat  unit  block  of  the  weighted  graph  Laplacian  A  is 
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0 


\ 


X 

a/12-14 


a/12-14 

X+Y  X 

14  a/12-14 

X  2X+Ze 

a/12-14  12  / 


(14) 


where  X  is  the  bond  constant  for  the  C-N  bond,  Y  for  the  N-N  bond,  and  Zu6  for 
the  contributions  of  vertices  outside  the  repeat  unit.  The  2  vectors  (0, 1, 1,  —1,  —1,  0) 
and  (0, 1,  —1, 1,  —1,  0)  span  a  subspace  invariant  under  application  of  A  but  which 
is  mapped  to  zero  for  operators  B  with  entries  BtJ  ^  0  i/j  /  (2,  3, 4,  5}. 

Hence,  these  vectors  are  highly  localized  (see  Section  10  for  details  of  the  general 
case). 


Fig.  3  Chemical  structure  of  polytetrazine 


In  large  linear  homopolymers,  discussed  in  later  sections,  the  invariant  subspaces 
represent  highly  degenerate  eigenvalues  due  to  the  polymer’s  repetitive  structure. 
Degeneracies  in  A  reduce  the  effectiveness  of  the  wavelets  because  no  choice  of 
accuracy  can  be  used  to  separate  them  into  subscales.  In  such  cases,  it  may  be 
possible  to  incorporate  more  information  from  the  potential  V,  but  this  is  outside  of 
the  current  scope. 

As  an  example,  butane  (H(CH2)4H)  has  14  atoms  and,  therefore,  the  graph  Lapla- 
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cian  associated  with  H(CH2)  iH  is  the  14-dimensional  square  matrix 


A  oc 


—  K 


2  +  2k  -1  -1 

-1  1 

-1  1 

—  K, 

- Kj 

3  +  K  -1  -1 

-1 

-1  1 

-1  1 

-1 

(15) 


where  k  =  44/31  is  the  ratio  of  force  constants  for  a  CC-bond  to  a  CH-bond  in 
the  polymer-consistent  force  field  (PCFF)28  and  the  units  of  kcal/mol/A2  have  been 
subsumed  in  the  proportionality  constant.  The  individual  CFF  repeat  units  have 
been  boxed  for  emphasis.  Since  660  kcal/mol/A2  is  an  upper  bound  for  A,  using 
the  filter  T  =  I  —  A/ (660  kcal/mol/A2)  and  cmathmc  as  the  cutoff  criterion  yields  6 
scales.  The  first  4  applications  (T,  T2,  T4,  T8,  T16)  did  not  lead  to  any  unit  vectors 
below  the  threshold.  At  n  —  5,  the  4  highest  frequency  modes  of  A  (A  =  517.1, 
524.0,  556.3,  and  556.3  kcal/mol/A2,  respectively), 
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-0.46  0.50  0.50  0.50/ 

(16) 


approximately  map  to  zero  for  T32,  that  is,  the  logarithm  of  the  expectation  values 
with  T  do  not  exceed  2-5  lne  ~  —1.13  (log(l  —  A/660)  <  —1.53)  .  These  CAMs 
correspond  to  the  symmetric  stretches  of  the  methyl  groups. 


The  6  second-highest  frequencies  (A  =440  kcal/mol/A2  for  all,  log(l  —  A/660)  ~ 
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-1.10), 
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are  computed  for  n  —  6  and  their  expectations  with  T  are  between  2-5  In  £  «= 
—1.13  and  2-6  In  e  ~=  —0.56.  These  CAMs  correspond  to  individual  HH  stretches. 
The  CAMs  on  the  first  and  last  2  rows  cover  the  degeneracy  between  the  3  hydro¬ 
gens  within  the  respective  methyl  groups,  whereas  the  third  and  fourth  rows  show 
isolated  HH  vibration  modes. 


The  next  2  powers  of  T  (T128^t256)  do  not  filter  out  any  new  spaces.  The  third 
wavelet  subspace  is  spanned  by  a  single  vector  (A  =72.7  kcal/mol/A2,  log(l  — 
A/660)  «  -0.12), 
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with  exponent  n  =  9  for  T2" .  The  logarithm  of  its  expectation  value  with  T  is  be¬ 
tween  2-6ln£  ps=  —0.56  and  2-9ln£  ?a=  —0.070.  This  CAM  isolates  the  asym¬ 
metric  stretch  between  the  terminal  carbons  and  the  centers  of  mass  of  the  bridging 
methylene  groups. 

The  fourth  wavelet  subspace  is  spanned  by  another  vector  (A  =42.1  kcal/mol/A2, 
log(l  -  A/660)  «  -0.066), 
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with  exponent  n  =  10  for  T2"  =  T1024.  The  logarithm  of  its  expectation  value  with 
T  is  between  2-9ln£  «=  —0.070  and  2-10lne  ~=  —0.035.  This  CAM  captures 
the  symmetric  stretch  between  the  center  of  mass  of  the  bridging  methylenes  and 
the  centers  of  mass  of  the  terminal  methyl  groups. 
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The  last  nontrivial  wavelet  subspace  also  is  spanned  by  a  single  vector  (A  =  12.2 
kcal/mol/A2,  log(l  —  A/660)  ~  —0.019), 
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with  exponent  n  =  11  for  T2"  =  T2048.  The  logarithm  of  its  expectation  value 
with  T  is  between  2_10lne  —0.035  and  2_11ln£:  «=  —0.018.  This  CAM 

captures  the  symmetric  stretch  of  the  bridging  carbons  and  the  centers  of  mass  of 
the  terminal  methyl  groups. 

The  coarsest  level  (0  kcal/mol/A2)  is  described  by 
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This  last  CAM  is  just  the  center  of  mass. 

2.1.3  Multiresolution  Construction  from  Fragments 

Since  the  previously  mentioned  invariant  subspaces  are  inherent  to  molecular  frag¬ 
ments  and  some  molecular  fragments  are  particularly  common,  the  question  of  how 
much  can  be  gained  by  precomputing  the  internal  scales  of  these  fragments  arises. 
For  example,  proteins  are  long  heterogeneous  polymers,  but  they  are  mostly  com¬ 
posed  of  only  20  amino  acids.  Hence,  it  is  instructive  to  characterize  the  effects  of 
modifying  bonds  or  substituting  different  atoms.  Indeed,  connecting  fragments  (see 
Section  1 1  for  details)  affect  a  small  portion  of  precomputed  CAMs,  such  that  only 
a  few  CAMs  need  to  be  adjusted.  Precomputed  fragments  can  therefore  speed  up 
the  computation  of  scales  considerably  when  the  perturbations  due  to  connecting 
them  are  relatively  small. 

2.1.4  Example:  Linear  Homopolymers 

Linear  homopolymers  are  an  important  class  of  materials,  whose  graph  Laplacians 
exhibit  convenient  structures  that  we  exploit  in  the  following  to  derive  their  CAMs. 
Linear  homopolymers  are  a  successive  addition  of  edges  between  identical  build¬ 
ing  blocks.  We  can  derive  computationally  inexpensive  algorithms  to  compute  the 
eigenvalues  of  their  weighted  graph  Laplacians  and  thereby  the  successive  construc¬ 
tion  of  the  respective  wavelet  spaces.  The  eigensy stems  are  computed  by  exploiting 
the  recursive  structure  of  A  to  solve  n  much  smaller  eigensystems,  where  n  is  the 
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number  of  repeat  units  in  a  single  chain  of  the  polymer.  The  graph  Laplacian  of 
linear  homopolymers  can  be  ordered  to  have  a  block  Toeplitz  structure,  where  each 
nonterminal  block  is  a  constant  m  x  m  matrix  for  the  off-diagonal  B  and  diagonal 
A,  respectively, 


( A  B* 

0 

B ^ 

B 

0 

0 

B* 

kb*  0 

B 

AJ 

where  m  is  the  number  of  particle  DoFs  in  the  repeat  unit  of  the  polymer.  Further¬ 
more,  the  off-diagonal  block  B  generally  consists  of  a  single  nonzero  entry  repre¬ 
senting  a  single  bond  connecting  repeat  units,  such  as  Bim  (off-diagonal  blocks  in 
Eq.  15  for  an  example).  This  highly  repetitive  structure  can  be  exploited  to  compute 
eigenvectors  and  eigenvalues  very  efficiently  and  hence  the  CAMs  (Section  12). 

2.2  Computational  Methods 

The  following  simulation  exemplifies  the  effectiveness  of  the  approach.  All  MD 
simulations  were  performed  in  the  the  Large-scale  Atomistic/Molecular  Massively 
Parallel  Simulator  (LAMMPS)29.  A  crystalline  model  of  polyethylene  (PE)  consist¬ 
ing  of  100,800  atoms  in  a  single  infinite  chain  was  simulated  in  the  NVT  ensemble 
under  periodic  boundary  conditions  with  1-fs  time  steps,  PCFF28  at  500  K.  The 
monoclinic  unit  cell  has  edge  vectors  a  =  (103.432 , 0,  0),  b  =  (0, 147.87 , 0),  and 
c  =  (73.88 , 0, 50.78 ).  The  initial  structure  is  depicted  in  Figure  5. 

2.3  Simulation  Results  and  Analysis 

We  witnessed  this  behavior  in  computations  of  PE  as  well  as  polyglucose.  Fur¬ 
thermore,  when  applied  to  the  alanine  dipeptide  model,  commonly  used  for  testing 
coarse-graining  methods  for  proteins,  the  wavelet  decomposition  correctly  identi¬ 
fied  the  various  chemical  groups  at  the  finer  scales,  while  the  coarse  scales  captured 
the  partitioning  corresponding  to  single  bond  rotations.  Figure  4  illustrates  the  de¬ 
composition  for  tetra-l,4-D-glucose  and  Figure  5  demonstrates  the  informational 
content  in  the  wavelet  DoFs  for  PE. 
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Fig.  4  Heat  map  of  selected  wavelet  DoFs  in  1,4-D-glucose  tetramer.  Blue  denotes  positive 
coefficients,  red  negative.  From  left  to  right:  fine-grained  wavelet  DoF  covering  a  single  repeat 
unit;  CG  wavelet  DoF  covering  one  half  of  the  oligomer;  coarser  wavelet  DoF  covering  the 
oligomer  isolating  repeat  units  with  sign  changes;  coarsest  wavelet. 


Fig.  5  Selected  reduced  wavelet  representations  of  a  PE  crystal.  From  a)  to  d)  wavelet  infor¬ 
mation  is  successively  added,  a)  Coarse  representation  highlights  the  anisotropy  in  1-D  chain 
averaged  over  all  chain  segments;  b)  Coarse  representation  of  the  2-D  chain  plane;  c)  3-D 
representation;  d)  full  resolution. 


Figure  6  shows  the  superimposed  Fourier-transformed  time  series  of  1,000  atoms 
from  the  500  K  trajectory,  which  due  to  the  high  temperature  shows  the  highest 
mobility  of  atoms.  Although,  the  zero  frequency  is  by  far  the  most  intense  signal 
(and  is  omitted  from  the  figure  for  clarity),  a  wide  range  of  other  frequencies  are 
active,  most  importantly  around  0.45  PHz,  which  limits  the  time  step  of  atomistic 
simulations  of  PE  to  1.2  fs  or  less. 
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Fig.  6  Fourier  transform  of  the  y-component  of  1,000  atoms  in  crystalline  PE  (100,800  atoms) 
excluding  zero  frequency  to  allow  detail  at  other  frequencies.  MD  at  500  K  and  1  atm.  Left: 
Individual  power  spectra  per  atom.  Right:  Power  spectrum  of  magnitude  of  optimal  represen¬ 
tation. 


On  the  other  hand,  Figure  7  shows  the  effects  of  scales  on  time-series  analysis.  The 
finest  scale  out  of  25  still  retains  the  high-frequency  components  (top,  left)  as  may 
be  expected,  but  they  are  much  less  intense  than  the  remaining  modes.  Traversing 
the  scales,  it  is  noteworthy  that  a  decreasing  number  of  DoFs  at  the  coarser  scales 
show  significant  peaks  at  all.  At  the  medium  scale  (top,  right),  no  high-frequency 
components  are  found  anymore.  Therefore,  medium  scale  and  coarser  DoFs  are 
quasi-static  compared  to  the  finer  scales.  Furthermore,  the  signals  are  clearly  sepa¬ 
rated  and  very  sharp  despite  the  high  temperature,  which  speaks  for  strongly  decou¬ 
pled  modes  and  justifies  dropping  the  finer  scales,  which  in  turn  facilitates  speed-up 
by  reducing  not  only  DoFs  but  also  increasing  the  propagation  time  step. 


0.2  0.3  0.4  o.s 

co/PHz 


Fig.  7  Top  row:  Fourier  transform  of  the  y-component  of  a  100,800  atom  crystalline  PE  sam¬ 
pled  at  1  fs.  3  scales  are  shown:  1st  (left,  finest  scale),  5th  (middle),  and  12th  scale  (out  of  25, 
right). 


The  same  is  witnessed  for  alanine  dipeptide  and  polyglucose  (not  shown).  Figure 
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8  shows  the  power  spectra  for  all  22  DoFs.  Clearly,  a  large  number  of  frequencies 
are  active  for  all  atoms.  The  spectrum  of  optimal  representation  shows  some  clear 
peaks  around  0.13  and  0.10  PHz.  But  in  Figure  9,  the  0.13-PHz  peak  is  absent  after 
the  second  scale  and  the  0.10-PHz  peak  including  noise  down  to  about  0.06  PHz 
vanishes  from  the  fourth  scale  on  through  the  remaining  4  scales. 


Fig.  8  Fourier  transform  of  the  z-component  of  alanine  dipeptide  in  vacuum  excluding  zero 
frequency  to  allow  detail  at  other  frequencies.  MD  at  500  K  and  1  atm.  Left:  Individual  power 
spectra  per  atom.  Right:  Power  spectrum  of  magnitude  of  optimal  representation. 
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Fig.  9  Fourier  transforms  of  alanine  dipeptide  DoF  time  series  in  vacuum  excluding  zero 
frequency  to  allow  detail  at  other  frequencies.  MD  at  500  K  and  1  atm.  a)  z-component  of  opti¬ 
mal  representation;  b)  second  finest  optimal  wavelet  DoF  z-component;  c)  third  finest  optimal 
wavelet  DoF  z-component;  d)  fourth  finest  optimal  wavelet  DoF  z-component. 


We  investigated  the  correlation  matrices  of  the  various  coordinate  systems  to  quan¬ 
tify  the  extent  of  interdependence.  Figure  10  shows  the  correlation  matrices, 

0  _ f  Xi(t)xj(s-t)  dsdt _ 

(f  xi(t)xi(s  ~  t)  ds  dt  J  x*(t)xj(s  —  t )  ds  dt ) 2 

where  Xi(t)  is  a  time-dependent  coordinate,  as  heat  maps.  As  may  be  expected,  all 
particle  coordinates  are  strongly  correlated,  but  in  both  the  water  case  as  well  as 
the  alanine  dipeptide  case,  much  less  correlation  is  witnessed  for  the  wavelet  DoFs 
(Figure  10,  right),  despite  the  few  DoFs  involved  in  these  small  systems.  The  extent 
of  overall  correlation  can  be  assessed  by  the  ('i-norm  of  the  correlation  matrices, 
about  28  versus  about  21  for  water,  and  about  315  versus  about  168  for  alanine 
dipeptide. 
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Fig.  10  Top:  Heat  maps  of  the  DoF  correlations  of  a  water  dimer  in  vacuum.  Bottom:  Heat 
maps  of  the  DoF  correlations  of  alanine  dipeptide  in  vacuum.  Left:  Correlations  of  the  carte¬ 
sian  DoFs.  Right:  Correlations  of  the  wavelet  DoFs. 


A  further  indication  of  the  appropriateness  of  the  diffusion  wavelet  DoFs,  is  the 
fi -norm  of  structures  and  forces  in  a  given  representation  basis.  The  C  -  norm  is 
bounded  by  the  f2-norm  for  any  orthonormal  basis  and  there  exists  at  least  one 
coordinate  system  in  which  ||x||i  =  ||rc || 2-  These  coordinate  systems  optimally  de¬ 
scribe  a  state  x. 

We  compared  the  fi-norm  in  the  Cartesian  coordinate  system,  the  nonlossy  wavelet 
coordinate  system,  and  the  optimal  coordinate  system  on  each  scale.  For  alanine 
dipeptide,  a  reduction  by  a  factor  of  about  2  was  achieved  by  switching  to  wavelets 
(3  times  the  optimum),  while  the  optimal  wavelet  system  managed  to  reduce  to 
twice  the  optimum.  For  polyglucose,  the  full  wavelet  DoFs  reduce  the  C-norm  to 
half  of  the  Cartesian  coordinate  system,  and  the  optimal  wavelet  system  reduces  it  to 
within  20%  of  the  optimum.  For  crystalline  PE,  the  full  wavelet  DoFs  improve  the 
structural  fx-norm  by  a  factor  of  about  7,  while  the  optimal  wavelet  representation 
comes  in  below  1%  of  the  Cartesian  coordinate  system  to  a  factor  of  roughly  twice 
the  optimal  representation.  Similar  results  were  obtained  for  amorphous  PE  and 
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nanocellulose. 


2.4  Adaptive  Multiresolution 

In  the  previous  sections,  our  goal  was  to  identify  scales  of  the  physical  system  to 
remove  unimportant  CAMs,  thereby  increasing  time  step  size  and  reducing  DoFs. 
However,  for  some  phenomena,  such  as  phase  transitions  like  melting  of  a  polymer, 
fine  details  that  are  unimportant  at  one  point  in  time  can  play  a  major  role  at  another. 
So  we  now  turn  to  the  problem  of  reconstruction,  that  is,  reintroducing  dropped 
CAMs.  We  find  that  reconstruction  is  systematically  possible  for  numerical  (e.g., 
derived  from  an  iterative  Boltzmann  inversion)  as  well  as  analytical  (where  such 
exists,  e.g.,  quadratic  potentials)  coarse-graining  hierarchies. 

2.4.1  Reconstruction  Theory 

We  begin  by  putting  coarse-graining  into  a  wider  context.  In  general,  a  coarsening 
7  :  a  — >  /3  is  a  continuous  surjection  between  2  topological  state  spaces  a,  the 
fine  space,  and  /3,  the  coarse  space,  which  can  be  parameterized  by  n  >  m  state 
variables,  respectively.  As  an  example,  one  can  consider  mapping  the  positions  of 
groups  of  particles  to  their  centers  of  mass.  In  statistical  mechanics,  the  fine-grained 
state  space  a  is  associated  with  a  probability  measure  Pa  :  {X  C  a}  — *  [0, 1]  and 
a  corresponding  probability  space.  For  systems  in  equilibrium,  a  would  further  fol¬ 
low  a  Boltzmann  distribution.  The  coarsening  7  thereby  induces  a  probability  space 
on  the  coarse  states  in  /3  as  well,  with  the  probability  measure  Pp(k)  =  Pa{^~l{k)), 
where  k  C  (3  and  7 _1(fc)  C  a  is  the  preimage  of  k.  Pp  constitutes  the  mean  proba¬ 
bility  distribution  of  the  coarsened  DoFs.  It  is  thus  possible  to  select  (reconstruct)  a 
precursor  for  a  state  b  e  /3  by  sampling  7-1  ({&})  with  Pa  via  the  conditional  proba¬ 
bility  P{a\b)  =  Pa({a}  r\'y~1(b))/Pa('y~1(b))  (e.g.,  using  Monte-Carlo  sampling). 
Since  n  >  m,  there  is  also  a  complementary  coarsening  7 -1  :  ct  — >■  ker  7  with  its 
complementary  probability  measure. 

In  MD,  the  state  spaces  consist  of  the  positions  and  their  associated  momenta  and 
thermodynamic  state  variables,  such  as  temperature  or  pressure.  The  probability 
distributions  are  Boltzmann  distributions  that  depend  on  the  studied  thermodynam¬ 
ical  ensemble.  In  a  sequence  of  coarsenings  (7™  :  /3n  — >  /3n+i)n  spanning  several 
scales,  such  as  derived  from  the  preceding  multiresolution  scheme,  it  is  generally 
not  cost  effective  to  sample  fully  in  the  largest  space  a  =:  (30  and  analytical  deriva¬ 
tions  for  Ppn  are  rarely  available.  Hence,  approximations  need  to  be  made.  Com- 
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mon  solutions  in  the  MD  community  are  probability  measures  from  iterative  Boltz¬ 
mann  inversions  or  (successive)  force  matching  to  generate  effective  potentials  that 
are  themselves  Boltzmann-distributed.  Hierarchical  iteration  thereby  produces  not 
only  probability  distributions  on  the  coarser  space,  but  also  conditional  probability 
distributions  for  a  fine  state  mapping  to  a  coarse  state.  Furthermore,  the  probabil¬ 
ity  distributions  can  be  used  to  indicate  when  a  previously  undersampled  coarse 
state  subspace  is  encountered  at  some  state  x  (e.g.,  using  an  expected  improvement 
measure30  or  a  sensitivity  analysis  of  the  potential  —  In  Pp(x)  with  respect  to  the 
sampled  points  for  which  the  trust  boundaries  can  be  precomputed). 

2.4.2  Coarse-Graining  Hierarchies 

In  the  context  of  MD,  mixed  resolutions  are  necessary  for  the  concurrent  treatment 
of,  for  instance,  gross  rigid  protein  orientations  and  flexible  active  site  residue  inter¬ 
actions  with  a  binding  molecule.  State  spaces  with  mixed  levels  of  fine  and  coarse 
CAMs  arise  naturally  from  the  multiresolution  scheme  laid  out  in  Eqs.  9  and  10 
(see  Section  13  and  Figure  11  for  details). 


Fig.  11  Dyadic  tree  generated  by  separable  coarsenings.  Shaded  spaces  are  final  subspaces  of 
the  multiresolution.  All  other  spaces  are  intermediates  of  the  construction  with  71  =  /za  ®  //), 
and  72  =  Ha®  Vb- 


The  sequence  of  coarsenings  (yn  :  Vn  — *  Vn+i  )n  generates  a  full  dyadic  tree,  due 
to  the  complementary  coarsenings  7^  (i.e.,  each  state  space  can  be  split  into  ker  77 
and  ker7n).  A  sequence  of  separable  coarsenings  (7,,,-  IXi-  1‘b,,)  thereby  in¬ 
duces  a  hierarchy  of  coarsenings  and  associated  probability  measures,  induced  as 
described  previously. 
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2.4.3  The  Wavelet  Hierarchy 

A  minimal  set  of  separable  coarsenings  that  generates  a  full  given  hierarchy  is 
fundamental  and  characterizes  said  hierarchy.  One  such  sequence  for  coarsenings 
based  on  CAMs  for  Vm  =  (®ili  ^  W2mn_ is 


I  2 -m.-lN  I  _  I  2~m~2N  I 

/Mm  '■  ®n=l  Wi™n- 1  — >  ®n=l  W2m+in- 1, 

zrsL2_m_l7VJ 

A4 Bm  =  ®n=L2-m-1iVJ+l  ©n=L2-m-27VJ+l  ^2m+1n-l, 


(24) 

(25) 


where  N  is  the  number  of  particles  in  the  finest  resolution. 


We  note  that  these  intermediate  probability  distributions  are  available  both  analyt¬ 
ically  and  numerically,  since  an  accurate  fundamental  coarsening  has  to  include 
proper  statistics  for  the  intermediate  state  space  to  be  consistently  sampled.  Recur¬ 
sive  application  of  conditional  probabilities  enables  concurrent  mixed  resolutions. 
Since  the  construction  of  CAMs  from  the  multiresolution  analysis  produces  a  hier¬ 
archy  of  frequencies,  it  induces  a  hierarchy  of  coarsenings  by  dropping  successively 
higher-frequency  CAMs  (i.e.,  by  applying  the  low-pass  filter  Pe(T2")). 

2.4.4  A  Priori  Approximations  for  Reconstruction 

Implementation  of  reconstruction  algorithms  as  discussed  above  requires  a  starting 
point.  In  the  following,  methods  are  proposed  for  finding  good  starting  points  based 
on  A  and  other  molecular  information  that  is  available  prior  to  simulation  (i.e., 
without  the  need  for  MD  or  Monte  Carlo  sampling).  To  second  order,  a  quadratic 
potential  around  the  equilibrium  positions  of  the  transformed  coordinates  approx¬ 
imates  the  full  potential.  We  assume  dominance  of  harmonic  terms,  both  in  the 
orginal  and  transformed  basis  of  the  potential, 

Vhond(M~WJr)  ~\(r-  ?0)TUAUT(r  -  f0),  (26) 

where  r  =  UMl^2x  is  the  position  vector  in  the  wavelet  basis,  and  UM 1//2  is  the 
wavelet  transformation  matrix.  From  statistical  mechanics,  the  root  mean  square 
deviation  from  equilibrium  of  a  harmonic  oscillator  is  \/kT/  A,  where  A  is  the  force 
constant.  In  other  words,  the  higher  frequency  components  are  found  increasingly 
close  to  their  energy  minima.  This  implies  that  finer  scales  only  have  small  devi¬ 
ations  from  their  equilibrium  positions,  while  coarser  scales  may  access  a  much 
larger  space. 
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We  start  by  approximating  ||  X  'i  X  j  ||  by  a  Taylor  expansion  around  rf^.  This  trans¬ 
forms  the  bond  potential  Vbond  into 


Abond  ~ 


(27) 


Hence  to  find  equilibrium  distances  for  f  —  U A7l  /2a:,  we  solve  the  minimization 
problem 

min  ^2  Cii  (ll  (M~^UTr0)i  -  (M~^UTr0)j\\2  -  rf^  ,  (28) 


where  Qj  =  K^/rff  *. 


For  example,  the  2  nonzero  eigenvalues  of  H20  correspond  to  a  unique  solution  for 
reconstructing  H20.  The  harmonic  Laplacian  for  H20, 


/  2 Kqh 
/  m0 


Kh2o  — 


V 


Kqh 

1 

mo 

Kqh 

mo 


Koh 

T~ 

mo 

Kqh 

0 


mg 

0 

&OH 


(29) 


shares  the  same  structure  as  CH2.  The  eigenvalues  A0,i,2  of  this  simple  matrix 
are  0,  KOH,  and  (1  +  2 /m0)K0H,  respectively,  with  corresponding  eigenvectors 
(■ rriQ 2, 1, 1),  (0, 1,  —1),  and  (2mg^2,  —1,  —1).  The  transformation  matrix  from  f  to 
x  is 


W 


V 


: 


0 

1 

-1 


2  m  q2 


A 


C (mo  +  2)  2 

V 


(2  +  4/mo)  V 
(30) 


The  corresponding  harmonic  potential  for  water  is 


Vh2o(xo,xi,x 2)  =  -iTorr  (||x0  -  xi||  -  r0rr)2  +  K0h  (||^o  -  x2\\  -  rOHf  ■ 

(31) 


Approved  for  public  release;  distribution  is  unlimited. 


23 


This  attains  its  minimum  when 


IK-xif  =||f2||2  (l  +  mo)  ~  y/1  +  2m01rlr2  +  ^\\ri\\2  =r2OH,  (32) 

11*0  -  x2\\2  =  |N|2  Q  +  mo1)  +  y/l  +  toijffa  +  i||ri||2  =r2OHl  (33) 

where  we  have  used  that  x0— x \  =  f2\/l/2  +  l/mo—ri/V2,  xq—x2  =  f2y/ 1/2  +  1  /mo+ 
h/V2.  Hence,  f\  and  r2  must  be  perpendicular  to  each  other  (subtracting  Eqs.  32 
and  33)  and  ||r2||2  (1/2  +  1/mo)  +  ^ll^i||2  =  roH ■  As  these  solutions  are  exact,  the 
hydrogens  are  always  at  a  distance  of  toh  from  the  oxygen.  Since  there  is  no  angu¬ 
lar  potential,  the  solution  is  indeterminate  with  2  extreme  solutions,  the  first  being 
||f2||  =  0.  In  this  case,  the  molecule  is  linear,  while  for  1 1 r x 1 1  =  0  the  2  hydrogens 
overlap. 


Another  instructive  example  is  HCN.  Its  harmonic  Laplacian, 


A 


HCN 


/ Kch+Kqn 
me 
KCh 

y/mcmH 

I<CN 
\  sj mcmN 


Kch 


y/mcmH 

Kqh 

mH 

0 


Kcn  \ 


y/mcmN 

0 

Kqn 

mjv 


(34) 


is  no  longer  as  simple  as  for  H20,  nor  are  the  eigenvalues  except  0  simple  functions 
of  the  variables  in  Eq.  34;  for  the  generalized  amber  force  field  (GAFF),  they  are 
14.7  and  42.8.  Eq.  28  was  numerically  solved.  The  numerical  GAFF  CAM  distances 
in  one  dimension  are  (fj°\  f^)  =  (3.26,0.66)  and  r[u)  =  2.61,  =  —1.32. 

Knowing  that  the  molecule  is  linear  at  equilibrium  selects  the  first  solution  to  re¬ 
construct  the  equilibrium  structure. 


In  both  examples  it  was  necessary  to  include  angle  information  to  make  the  best 
choice.  The  numerical  solution  to  Eq.  28  can  be  computed  efficiently  using  a  va¬ 
riety  of  nonlinear  least-squares  algorithms,  but  more  direct  methods  are  still  under 
investigation.  Similar  derivations  are  possible  for  angle  potentials  and  under  current 
investigation. 
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3^  Operator  Wavelets  Theory 


To  perform  efficient  wavelet  representations  of  DoFs,  there  also  needs  to  be  an 
appropriate  effective  operator  that  can  exploit  the  wavelet  structure  of  the  CAMs. 
Ideally,  this  operator  can  itself  be  constructed  in  a  hierarchical  fashion.  The  follow¬ 
ing  explores  the  properties  of  operator  hierarchies  that  arise  from  doubling  and  shift 
operators.  Repeated  application  of  an  invertible  doubling  operator  D  \  (  — >•  2*fi 

and  an  invertible  shift  operator  a  :  2z?i  — *  Jzfi  on  an  operator  space  Jzfj  generates  a 
scaling  operator  <3>  analogous  to  a  scaling  function  in  wavelet  theory  on  functions. 
Then, 


D(J  +  a)$  =  $.  (35) 

Since  the  identity  is  subject  to  D  and  Da,  there  are  associated  operators  d  G  2z?i 
and  ds  G  2z?j,  respectively.  If  furthermore  D  and  a  are  endomorpisms  on  I£\ ,  then 

(d  +  ds)$a  =  ( DI  +  DaI)(D&  +  Da$)a  =  <J>a  (36) 

( d  +  ds)a  =  a  (37) 

for  any  eigenfunction  a  of  <I>.  Hence  the  eigenfunctions  of  <f>  are  scaling  functions 
under  the  doubling  operator  d  and  shift  operator  s.  The  corresponding  operator 
wavelets  are 


=  DnakD(I  -  a)<T,  (38) 

ipa,n,k  =  dnsk(d  —  ds)a.  (39) 

An  approximate  procedure  to  develop  a  hierarchy  that  is  not  based  on  endomor¬ 
phism  on  Jzlj  is  to  use  the  doubling  operator  fI>  2<f>  and  use  polynomial  filtering 
on  the  spectrum  (i.e.,  $  (->■  F{(1>)  =  Ft(F  where  the  polynomial  F  approxi¬ 
mates  a  low -pass  filter).  Under  this  construction  any  scaling  operator  is  unitarily 
equivalent  to  any  other  (i.e.,  they  share  the  same  spectrum). 

4.  Trotter  Factorization  for  Use  with  Operator  Wavelets 

A  hierarchical  decomposition  of  the  infinitesimal  generator  of  time  propagation  (Li- 
ouville  operator)  into  operator  wavelets  enables  the  separation  of  time  scales  rigor¬ 
ously  using  Trotter  factorizations  for  numerical  propagation.  We  split  the  Liouville 
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operator  into 


N 

m’=y,h'+  Y  (4°) 

i=0  i,je{0,...,N} 

where  Hi  are  individually  solved  subsystems  and  Hij  are  the  interactions  between 
these,  such  as  the  operator  wavelets  introduced  in  Section  3.  Preferably,  these  parti¬ 
tions  coincide  with  inherent  time  scales  t,  and  tij.  Let  the  time  scales  be  in  ascend¬ 
ing  order  and  readjusted  to  the  closest  integer  multiple  (i.e.,  tI+ 1  =  if) 

and  r  =  max  tj.  If  ti+\  =  tj,  it  is  useless  to  separate  II f+i  and  II i.  So  then, 


ei^r 


N  If  0 

nM  n 


7=0 


JHth/ 2  A  T/tl 


I=N 


(41) 


When  the  time  propagator  elHltl  has  analytical  solutions,  such  as  with  harmonic  os¬ 
cillators,  then  the  analytical  solution  ei/7,r/2  can  be  used  instead  of  the  r/tj  applica¬ 
tions  of  the  approximation.  If  r/tj  is  large,  considerable  savings  may  be  expected. 

In  particular,  consider  a  quadratic  potential  with  matrix  A  (i.e.,  iMJq  =  —  (x  — 
^)TA4+PTM-^),  then 

The  middle  term, 


can  be  solved  analytically,  so  any  time  interval  r  can  be  computed  exactly,  whereas 

M’-M’q  =  (-VTV+  (x-xfA^j  Y  (44) 

requires  only  a  single  force  evaluation  similar  to  other  reversible  reference  system 
propagator  algorithms. 


To  leverage  this  factorization,  an  appropriate  origin  x  needs  to  be  assessed  either 
by  running  short  bootstrapping  trajectories  or  analytical  expressions  specific  to  the 
system.  For  high  frequencies,  the  propagation  by  Mq  may  wildly  oscillate  through 
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the  origin.  Which  origin  is  optimal  may  actually  depend  on  the  amount  of  energy 
in  the  quadratic  term. 


5.  Solutions  to  "Harmonic"  Oscillators 


The  Trotter  factorization  in  Section  4  is  most  useful  if  analytical  solutions  exist  for 
the  involved  operators.  The  Liouville  operator  ££  for  bond  oscillators  in  MD  is 


=  ~kii  0^  -  -  ri?) 

ij 


Xi  —  Xj  d 
Xi  -  xj  ||  dpi 


Pi  d 
rrii  dxi 


(45) 


We  set  rj  =  Ex,  where  E  is  the  incidence  matrix  (i.e.,  r/  are  the  edges  of  the  graph 
spanned  by  the  bonds).  Furthermore,  let  ET6  =  p.  Then 


V:c  =  £tV„,  (46) 

EVP  =  V*.  (47) 


Inserting  these  into  Eq.  45,  gives 

iSf  =  V -~ki(\\rn\\  -  rf)^v0/  +  6T  EM~l  ETV  n.  (48) 

i  M 

Assuming  a  simple  ansatz  that 

-  ki(\\vi\\  ~  r^jr^VeJi  +  Oj E^M^E^W mfj  =  Xjfj,  (49) 

then 

iSfg(ri,  °)  =  Ot(EM~1Et  -  D)Vvg  +  A g,  (50) 

where  g  =  Y,  dfiivi,  Si)  and  A g  =  d^ifi(r)i,  Si). 

The  Liouville  operator  is  rotationally  invariant  (i.e.,  Jtf  commutes  with  any  ro¬ 
tation  R(4>)).  Hence,  there  are  simultaneous  eigenfunctions  of  ££  and  rf  RS7  r]  + 

9TRVg,  where  R  =  ^  ^  Eigenfunctions  are  compositions  of  (1  ±  iR)rj  and 

(1  ±  R)0  with  eigenvalues  ±i. 

The  following  equations  are  some  helpful  identities  that  were  used  in  the  previous 
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derivations: 


cos  0  sin  0 

r)  =  Vr  \  .  ,  I  v 

■  sin  0  cos  0 , 


9  =  6r 


cos  0  —  sin  ( 


sin  0  cos  0 
v„  =  7 7ri?(0)TVJ?  +  9rR((j))Vg 

/-  =  vT  R(4>)T\/ n 

(ITjr 

wrvTmv° 


-k(vr  ~  ro)nTi?T(0) V@  +  0ruTi?(0)MV^ 


(51) 

(52) 

(53) 

(54) 

(55) 

(56) 

(57) 


6.  Hermite  Functions  for  Approximating  Potentials 

In  the  decomposition  of  the  Liouville  operator  in  Sections  3  and  4,  a  suitable  basis 
of  potential  functions  is  needed.  This  basis  is  preferably  sufficiently  localized  to 
avoid  the  need  for  global  integrals.  The  Hermite  functions, 

_1  1^2  dn  _  2 


=(-l)»  (2"n!v^)  5e3s  —  e 
=(-lf  (2 nn\V^)~hHn(x)e^x2, 
^2n  =(~l)2n  (22n2 ^ 


i  j2n 

-f  i^2  a  ^__x2 
1^2 


=(-l)* i 2"  (22n2n!v^f)  3  H2n(x)e-2X  , 


(58) 


(59) 


are  an  orthonormal  complete  set  of  basis  functions  that  obey  a  3 -term  recursion 
formula, 


0n+l(x)  = 


n  +  1 


2  I  Tl 

xi/jn(x)-A  — — 0v_  i(x),  (60) 


n  +  1 


(2a;2  —  An  —  1) 

02n+2  =  — /.  '  .  =02n 


2n(2n  —  1) 


~1p2n—2- 


(61) 


^(2n  +  2)(2n  +  1)  y  (2ra  +  1) (2 n  +  2) 

The  recursion  formula  enables  a  fast  transform  of  functions.  Since  "harmonic"  po 
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tentials  are  merely  functions  that  capture  3  aspects  of  bonding,  it  is  possible  to  use 
Hermite  functions  in  their  stead.  The  boundary  conditions  are 


/( 0)  =rl-D, 

(62) 

(63) 

f{'r  0)  =  —D, 

(64) 

^l-k 

dx2 

(65) 

f(x)  = 

(66) 

where  D  is  the  dissociation  energy,  k  is  the  force  constant,  and  r0  is  the  equilibrium 
distance.  These  conditions  result  in  a  linear  system  for  coefficients  in  an  expansion 
of  Hermite  functions, 


/  = 


(67) 


A  simple  decomposition  converts  this  equation  into  the  coefficients  of  the  Hermite 
functions  of  scale  a !  The  Liouville  operator  for  such  a  potential  would  be 


-  XjW)—  -  Vjf(\\xi  —  +  mi  1Pi^+mj  ]pJ XA 


d p, 

/  +  (  2/3  +  47 

_ 1  rr  d  _ 1  rp  d 


d  Pi 


3  3  dXn 


X2  OC  j 


a 


OC  2  X  j 


a 


1  xi~xj 

'2  a 


l 

d  d 

d Pi  d Ipj 

(68) 


A  further  point  of  note  is  the  fact  that  this  expansion  only  includes  quadratic  terms 
with  respect  to  the  DoFs,  which  allows  much  simpler  decomposition  into  compo¬ 
nent  terms  after  linear  transformation  of  the  DoFs  into  CAMs.  Finally,  geometries 
may  be  defined  by  distances  obviating  the  need  for  angular  and  dihedral  potentials. 
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7.  Koopman  for  MD 


In  the  following,  the  relationship  between  operator  wavelets  and  the  Koopman 
modes  of  the  Liouville  operator  is  demonstrated.  The  Liouville  operator  22?  is  a 
Koopman  operator  for  MD  systems.  Let  x(x,p)  =  X]{a}  (x,p),  where  Kx  is 

an  eigenfunction  of  22?  and  x  :  (x,  p)  (->•  x.  Then, 

x(t)  =  e~t^'tx(x0,p0)  =  y 'elXjtxXiKx.  ( x0,p0 ),  (69) 

where  A j  is  the  corresponding  eigenvalue  to  KXj.  A  decomposition  of  using 
an  operator  hierarchy  as  in  Section  3  naturally  separates  the  scales  by  the  wavelet 
operators.  Therefore,  KXj  is  in  the  kernel  of  all  nontrivial  operator  wavelets  save 
one.  The  eigenvalues  A  j  are  generally  degenerate.  Grouping  eigenfunctions  KXk 


with  the  same  eigenvalue 

x(t)  =  ^2  elXj  ^2  Xj,kKjAX 0>  Po)  =  ]2  AielXj  ’  (7°) 

reduces  the  number  of  actually  free  variables,  where  A3  are  the  effective  free  vari¬ 
ables.  Assume  2  observables  evolve  in  time 

x(t)  =  e~lJietx(xo,Po)  =  ^2etXjtxXjKXj(x0,p0),  (71) 

y(t)  =  e~lJZty{y<h  %)  =  ^2 f  ,A  '//a  /^'a, (2/0,  go)-  (72) 

Then,  cross-correlation  indicates 

(Av)  =  E  x*x/y\:iKXj  ( x0 , Po)Kx.  (y0,  go).  (73) 


It  follows  that  the  eigenvectors  of  the  correlation  matrix  of  a  set  of  observables  is 
the  optimal  linear  representation  of  same  observables.  This  includes  the  original 
DoFs  as  well  as  CAMs. 

We  can  further  characterize  the  eigenfunctions  Kx  of  22?  for  physical  systems  by  its 
distributive  property, 


(74) 
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Hence,  any  power  of  an  eigenfunction  of  SA  is  itself  an  eigenfunction  of  SA  via 

n—  1 

SAK\  =  A Kx  =>  K™(&Kx)Kl~m-1  =  ri\Knx.  (75) 

m= 0 

This  motivates  a  simple  shift  operator  Kx  1  and  a  doubling  operator  D  :  f  ( Kx )  i->- 
f(Kx),  where  /  is  a  formal  series  in  polynomial  powers  of  Kx. 


The  effect  of  the  Koopman  operator  can  be  captured  by  bootstrapping.  With  an 
initial  trajectory,  the  effective  frequencies  can  be  found  by  solving 


N 

min  ) 


M 


x(nAt)  — 


eiWjnAt~ 


Xj 


(76) 


where  N  is  the  number  of  time  steps  in  the  initial  run,  At  is  the  time  step,  M  is  the 
number  of  frequencies  to  capture,  and  C  is  a  cost  factor  to  control  overfitting  with 
regularizer  ^||xjj|.  By  comparison  with  Eq.  71,  Xj  ~  xWj.  Then  Ax  :=  x(t)  — 
elWjtXj  is  the  residual  due  to  ASA  Jz?  —  S£ ',  where  S£  is  the  linear  operator 

with  eigenfunctions  and  eigenvalues  ( KWj,Wj )  such  that  /  0 V/  ^  span  {KWj}. 

It  follows  that  SA  and  ASA  are  orthogonal  operators  and  commute.  Thus 


lift  _  iAJft+iS’t  _  iAJft  iJCt  _  iA’t  iAJft 

c  c  c  c  c  c 


(77) 


At  time  t,  the  action  of  ASA  on  x  is  A p  :  =  p(t)  —  Y  iWjXjelWjt ,  whereas  the  action 
on  Ap  is  —VI/ ( x[t ))  +  Y  w2jXjelWjt. 

Using  the  usual  Velocity- Verlet  factorization  going  from  time  t  —  0  to  t  —  At, 


A^i  = 

A  p2  = 
Ax3  = 
x(A  t)  = 
p(At)  = 


Ap0At/2  +  Aa:0,  (78) 

A-Po  —  ( VU (Axi  +  x(At/2 ))  —  p(At/2))  t,  (79) 

Aaq  +  Ap2t/2  =  Ax0  +  (A p0  +  A p2)  t/2,  (80) 

Ax3  +  x(At),  (81) 

Ap2+p(At).  (82) 


This  approach  is  equivalent  to  introducing  new  variables  a?  and  /3j  :=  a3  and 
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changing  the  Liouville  operator  to 


=  tr 


(VTXV)  (x  +  J2  aj)  Vp\  +  trpTM~1Vx  +  ^  /f  Vc 


-  tfai  V 


ft- 


(83) 


8.  Derivation  of  Wavelet  Spaces 

The  wavelets  are  derived  iteratively  from  a  filter  T  using  the  QR  decomposition.  At 
each  iteration,  T  is  cast  in  the  “Q”  basis  of  the  previous  iteration, 

T2n+1  \  =Qn+lRn+l-  (84) 

Therefore,  repeated  application  (with  infinite  precision)  is  equivalent  to  the  QR  al¬ 
gorithm  for  finding  eigenvectors.  Since  T  is  positive  definite  and  \\T\loo  =  1,  the 
squaring  introduces  a  de  facto  projection  operator  P£  via  the  machine  precision.  It 
is  this  projection  that  distinguishes  a  conventional  QR  algorithm  for  finding  eigen¬ 
values  and  eigenvectors  from  the  wavelet  decomposition  into  CAMs. 


We  separate  Qn  into  a  low-frequency  submatrix  and  a  high-frequency  submatrix 
T„,  where  the  latter  are  the  columns  Qn  t  of  Qn  for  which  Q^fT2" Qni  <  e.  The 
matrix  <!>„  collects  the  remaining  columns  of  ()n.  Thus, 


T 


rp  2 


n+1 


ilQnR„Rl<&r, 


(n*h^(n*o- 

2—0  2—0 


9.  Error  Bounds  on  Scales 

The  contamination  of  the  wavelet  spaces  Wn  by  eigenvectors  U>  of  larger  eigen¬ 
values  A2"  >  e  of  the  filter  operator  T  is  limited  by 

c(n)  .  ^  ^ n,max  /oc\ 

oT  ,  (p5) 

^ n,min  £ 

where  urhmin  =  min  {A2"  >  e|A  G  cr(T)},  un.max  =  max{A2"  <  e|A  G  ct(T)}, 
and  cr(T)  denotes  the  spectrum  of  T.  In  classical  wavelet  theory,  wavelets  are  lo¬ 
calized  both  in  real  and  reciprocal  space  (e.g.,  the  Fourier  transform  of  the  Mexi- 
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can  hat  wavelet  transform  Wf(x)  =  —<j~2A  J  f  (y)dy  is  W f(u>)  = 

uJ2(j-  2e~uj2^4'7'2'1  f(uj)).  The  sensitivity  Sj‘ 1  of  T  at  scale  n  measures  the  localization 
of  Wn  in  the  frequency  domain.  In  practical  terms,  small  Sj" 1  implies  that  the  bases 
(l>n  and  T„  are  generally  sparse  matrices  if  T2"  is  sparse,  and  the  wavelet  transform 
can  thus  be  computed  efficiently. 

10.  Invariant  Subspaces 

A  is  positive  semi-definite  and  AjjM1/21j  =  0  for  each  set  of  indices  J,  where 
A  jj  is  the  square  submatrix  of  A  with  indices  in  J,  and  lj  is  the  vector  of  ones 
on  indices  in  J  and  0  otherwise.  It  is  possible  to  block  tridiagonalize  A  using  only 
transpositions  with  diagonal  blocks  A  jyj  and  off-diagonal  blocks  A  jj<,  where  J 
and  K  are  disjoint  index  sets  and  J,  K.  J  U  K  are  contiguous  index  sets.  Without 
loss  of  generality,  we  let  j  <  k'x/Jj  <E  J,  k  e  K.  The  rank  of  Ajk  is  generally 
low  due  to  the  low  maximum  degree  of  a  vertex  in  A.  If  A  =  span  A^j  and  A  jj 
has  a  nontrivial,  invariant  vector  space  T  perpendicular  to  A,  then  T  is  localized 
to  indices  preceding  K .  Examples  include  linear  homopolymers  discussed  in  de¬ 
tail  in  Section  2.1.4  and  Section  12,  but  also  disconnected  graphs  from  individual 
molecules.  Invariant  subspaces  {T}  are  computationally  convenient  because  they 
allow  a  separation  of  the  problem  into  independent  smaller  subproblems. 

11.  Perturbation  Theory  for  Molecular  Fragments 

Changing  the  mass  of  atom  i  by  8  leads  to  M'~^2  =  M-1/2  +  Se^ef,  and  similarly 
A  becomes 


A'  =(M"3  +  5eieJ)A(M “2  +  Saef)  (86) 

=A  +  8mf  (e-ief  A  +  Ae^ef )  +  52miAiieieJ.  (87) 

For  an  eigenvector  v  of  A  contained  in  some  wavelet  space  Wn  with  nondegen¬ 
erate  eigenvalue  rju,  the  first-order  correction  to  the  eigenvalue  is  mlJ28\vi\2{2r)v  + 
mY2Au8),  and  the  first-order  correction  to  the  eigenvector  is  //  v)/i,z/,rn?l/,2(  //,y+ 

r/M  +  8mrAii) / (2r//;  —  2 rjv),  where  {/i}  is  an  orthonormalized  set  of  eigenvectors  of 
A.  Hence  the  scale  of  v  is  unaffected  if  /a  2  is  0  or  sufficiently  negligible.  Correc¬ 
tions  to  the  eigenvector  are  only  of  significance  if  some  eigenvector  f  t  outside  of 
Wn  additionally  has  a  significant  component  //,. 
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By  a  similar  argument,  first-order  corrections  can  be  computed  for  a  perturbation  of 
the  edge  weights  (e.g.,  changing  bond  spring  constants), 


A'  =A  +  SM  2  (e*ef  +  ejeJ  -  e-iej  -  ejeJ)M  2 , 
_  1  _  1 

rjW  =5\mi  2 Vi  -  m-  2Vj |2, 


!/«  = 


E* 


(■ ntj  2/i*  —  mj  2 2 Vi  -  rrij  2vj) 

Vn  ~  rb 


-V- 


(88) 

(89) 

(90) 


Hence,  a  wavelet  subspace  Wn  only  changes  if  3v  G  Wn  :  ||TV||2  +  5||^||oo  > 
£1/2"  or  \\Tv\\2  -  5|M|oo  <  £1/2n~\ 


12.  Derivation  of  Homopolymer  Wavelets 

The  computation  of  wavelet  spaces  for  linear  homopolymers  with  n  repeat  units  can 
be  subdivided  into  n  subproblems.  The  recursive  structure  of  linear  homopolymers 
implies  that  A  for  a  linear  homopolymer  of  n  repeat  units  can  be  reordered  by  a 
permutation  k  of  indices  such  that  kt  Ak  —  A  <g>  In  +  B  <S>  (E^  +  enej)  +  BT  ® 
(En  +  e  i  ) ,  where  In  is  the  nx  n  identity  matrix,  and  En  is  the  n  x  n-matrix  with 
all  ones  on  the  first  subdiagonal  only  and  zeros  elsewhere.  Then,  kt Ak  takes  the 
simple  form 

^  -^lji-Zn  A\2In  ■  ■  ■  A\mIn  T  B\  m(y En  T  eien)^ 

-'4.2,1  In  42j2 In  '  '  '  -41)Tra/n 

\Am,lIn  T  B^  m(Yjn  T  ene1  )  ■  ■  Am, min  ) 

(91) 


Let  U  =  yTn)fci  be  a  unitary  matrix  that  diagonalizes  Ai)m/n+I?i)m(E^  + 

enef),  then  U  :=  Im®U  transforms  ktAk  into 


^Aiiln 

Alfiln  ' 

D*  \ 

A2,Jn 

A2,2ln  ' 

A,  T 

■ril  ,m±n 

\  D 

Amt2In 

=  UV  A«U, 


(92) 


where  D  is  a  complex  diagonal  matrix  with  eigenvalues  Dkj.  =  A^m  +  B i}melk77^m. 
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Applying  the  similarity  transform  by  kt  produces  a  block  diagonal  matrix  (kU*kt 
AkAJkt)  with  square  blocks  A{  =  A  -  (Ai,m  +  A,*)e iem  ~  (Ai ,m  +  -D*,i)emef . 
Hence,  the  computation  of  the  wavelet  spaces  has  been  reduced  to  n  problems  of 
size  m  instead  of  one  problem  of  size  nm.  Furthermore,  any  eigenvector  v  of  A 
with  e^v  =  eju  =  0  has  an  n-fold  degenerate  eigenvalue. 

13.  Derivation  of  Mixed  Resolution  from  Separable  Coarsenings 

Let  :  Wn'n  — >  (3n  denote  a  parameterization  of  the  mn -dimensional  state  space 
l3n  (e.g.,  the  positions  and  momenta  of  particles).  A  coarsening  yn  can  be  separated 
into  components  if  there  exist  parameterizations  (3^  :  An  ©  Bn  — >■  /3n  and  j3^+1  : 

An+ 1  ©  Bn+ 1  ->■  /3n+ 1 ,  such  that  An  ©  Bn  =  Rmn ,  An+i  ©  Bn+1  =  Mm"+1 ,  and  there 
exist  continuous  surjective  mappings  iiAn  :  An  — >  An+i  and  nBn  :  Bn  —>  Bn+1 
with 

=7n1(^n+i  (tA,tB))  (93) 

for  tA  £  An+i,ts  €  fjn+i.  ii a  and  fiB  map  a  fine  parametrization  to  a  coarse 
parameterization.  An  example  parametrization  is  the  representation  of  Vn  as  Vn+  \  © 
Wn+\  with  fine-to-coarse  mappings  as  per  Eqs.  9  and  10.  See  Figure  11. 

A  separable  coarsening  for  which  neither  fiAn  nor  fxBn  are  bijections  induces  inter¬ 
mediate  coarsenings.  The  state  space  An  x  Bn+i  is  an  intermediate  state  space 
with  the  coarsenings  7  :  (3Tn{tAn,tBn)  ^  (tAn,BBn(tBn))  G  An  x  Bn+1  and 
y  ■  (tA n,tBn+1)  ^  Pn+i(VAn(tAn),tBn+1)  G  /3n+i.  Finally,  mixed  resolution 
spaces  An  x  Bn+ 2  are  induced  via  function  composition,  7  :  PT(tAn,tBn)  i-A 

i^Anl  BBn+l  0  BBn{tBn)- 

The  reconstruction  from  a  fundamental  coarsening  can  be  achieved  by  reconstruct¬ 
ing  its  separable  components  separately  and  independently  via  the  conditional  prob¬ 
abilities  P(t  An  |  tAn+1 ,  tBn+1  )  =  Ppn(({tAn}  n  ii  aI  (tAn+1  ))xBbI  (*  b»+i  ) )  /P/Wi  (*  An+1 ,  t  B, 
The  same  can  be  achieved  for  the  complements  analogously. 


Approved  for  public  release;  distribution  is  unlimited. 


14.  Conclusions  and  Outlook 


We  have  characterized  a  coarse-graining  procedure  for  accelerating  molecular  sim¬ 
ulations  through  a  systematic,  hierarchical  algorithm  based  on  multiresolution  dif¬ 
fusion  wavelets.  The  proposed  wavelet-CG  approach  goes  beyond  conventional  ap¬ 
proaches  based  on  expert  knowledge:  because  our  proposed  method  can  acceler¬ 
ate  calculations  of  novel  classes  of  molecules  without  requiring  extensive  expert 
insight  and  model  parameterization.  This  advantage  is  especially  important  for  in¬ 
verse  problems  in  materials  design,  wherein  the  materials  engineer  aims  to  optimize 
material  performance  in  an  essentially  infinite  design  space  (the  chemical  space  of 
polymer  repeat  units).  Our  demonstration  of  the  perturbation  theory  for  chemical 
variations  in  the  repeat  unit  illustrates  this  key  advantage  for  the  wavelet  CG  ap¬ 
proach. 

Importantly,  these  advantages  are  obtained  in  a  framework  that  automatically  re¬ 
capitulates  the  physical  insights  underlying  existing  coarse-graining  methods  such 
as  united-atom  models.  On  the  other  hand,  diffusion-wavelet  CG  models  are  si¬ 
multaneously  more  general  (they  do  not  require  a  priori  expert  modeling  and  pa¬ 
rameterization)  and  more  specific.  In  fact,  the  diffusion-wavelet  CG  approach  leads 
to  system-specific  CG  models  derived  automatically  from  the  system’s  underlying 
bonding  topology  and  atomistic  force  field,  without  further  input  other  than  an  er¬ 
ror  tolerance.  The  systematic  and  purely  algorithmic  basis  offers  the  opportunity  for 
adaptive  error  control,  whose  obvious  importance  has  motivated  significant  analysis 
already 3 1-33 . 

Future  work  will  establish  the  relationship  between  time  steps  and  simulation  accu¬ 
racy34.  Currently,  the  bootstrapping  procedure  for  optimal  Koopman  mode  approxi¬ 
mation  requires  embedding  in  the  propagation  scheme  and  linking  to  stochastic  dy¬ 
namics.  The  operator  wavelet  approach  has  implications  for  developing  new  opera¬ 
tor  decompositions,  which  require  derivation  and  solution  of  analytical  subsystems. 
Meanwhile,  a  proper  effective  operator  decomposition  using  Hermite  functions  or 
conventional  harmonic,  angle,  and  dihedral  functions  as  a  basis  for  fitting  can  be 
achieved. 
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2.  Rinderspacher  BC,  Bardhan  JP,  Ismail  AE.  Diffusion  wavelet  decomposition 
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List  of  Symbols,  Abbreviations,  and  Acronyms 


CAM  collective  action  mode 
CG  coarse-grained 
DoF  degree  of  freedom 
GAFF  generalized  amber  force  field 

LAMMPS  Large-scale  Atomistic/Molecular  Massively  Parallel  Simulator 
MD  molecular  dynamics 
PCFF  polymer-consistent  force  field 
PE  polyethylene 
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